Hierarchical data generator based on tree-structured stick breaking process for benchmarking clustering methods
Abstract
A new variant of Hierarchical Cluster Analysis, called Object Cluster Hierarchy, is gaining interest in the field of Machine Learning. Because the concept is still at an early stage of development, the lack of tools for systematic analysis of Object Cluster Hierarchies inhibits its further improvement. In this paper we address this issue by proposing a generator of synthetic hierarchical data that can be used for benchmarking Object Cluster Hierarchy generation methods. The article presents a thorough empirical and theoretical analysis of the generator and provides guidance on how to control its parameters. The conducted experiments show the usefulness of the generator, which is capable of producing a wide range of differently structured data. Furthermore, datasets that represent the most common types of hierarchies are generated and made available to the public for benchmarking, along with the developed generator (http://kio.pwr.edu.pl/?page_id=396).
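As a rough illustration of the tree-structured stick-breaking idea named in the title, the sketch below assigns points to nodes of a lazily grown tree. It is not the authors' generator: the names `Node`, `sample_path`, `assign_points`, the parameters `alpha0`, `decay`, `gamma`, and the `max_depth` cut-off are all assumptions made for this example. At each node a point stops with probability nu (drawn from a Beta distribution whose second parameter shrinks with depth); otherwise it picks a child by breaking pieces off a stick.

```python
import random


class Node:
    """One node of a lazily instantiated tree (hypothetical helper)."""

    def __init__(self, depth, rng, alpha0, decay):
        self.depth = depth
        # nu ~ Beta(1, alpha0 * decay**depth): chance that a point reaching
        # this node stays here instead of descending further.
        self.nu = rng.betavariate(1.0, alpha0 * decay ** depth)
        self.psi = []       # psi_i ~ Beta(1, gamma): stick fractions over children
        self.children = []  # child nodes, created only when first visited
        self.count = 0      # number of points assigned to this node


def sample_path(root, rng, alpha0, decay, gamma, max_depth):
    """Walk down from the root; return the child indices of the chosen node."""
    node, path = root, []
    # max_depth is an assumed truncation so the walk always terminates quickly.
    while node.depth < max_depth and rng.random() >= node.nu:
        i = 0
        while True:
            if i == len(node.psi):  # break a new piece off the remaining stick
                node.psi.append(rng.betavariate(1.0, gamma))
                node.children.append(Node(node.depth + 1, rng, alpha0, decay))
            if rng.random() < node.psi[i]:  # accept child i
                break
            i += 1
        path.append(i)
        node = node.children[i]
    node.count += 1
    return tuple(path)


def assign_points(n, alpha0=1.0, decay=0.5, gamma=1.0, max_depth=4, seed=0):
    """Assign n points to tree nodes; returns the root and one path per point."""
    rng = random.Random(seed)
    root = Node(0, rng, alpha0, decay)
    paths = [sample_path(root, rng, alpha0, decay, gamma, max_depth)
             for _ in range(n)]
    return root, paths
```

In this sketch, larger `alpha0` pushes points deeper into the tree, smaller `decay` makes deep levels less likely, and larger `gamma` spreads points over more children. A full generator would also attach data-generating parameters to each node, which is left out here.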
Similar resources
Tree-Structured Stick Breaking for Hierarchical Data
Many data are naturally modeled by an unobserved hierarchical structure. In this paper we propose a flexible nonparametric prior over unknown data hierarchies. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view our model as providing infinite mixtures where the component...
Tree-Structured Stick Breaking Processes for Hierarchical Data, by Ryan P. Adams, Zoubin Ghahramani and Michael I. Jordan
Many data are naturally modeled by an unobserved hierarchical structure. In this paper we propose a flexible nonparametric prior over unknown data hierarchies. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view our model as providing infinite mixtures where the component...
Logistic Stick-Breaking Process
A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is als...
HIERARCHICAL DATA CLUSTERING MODEL FOR ANALYZING PASSENGERS’ TRIP IN HIGHWAYS
One of the most important issues in urban planning is developing sustainable public transportation. A basic condition for this is analyzing the current situation, especially on the basis of data. Data mining is a set of new techniques that go beyond statistical data analysis. Clustering techniques are a subset of these, one of which is used for analyzing passengers’ trips. The result of...
Kernel Methods for Tree Structured Data
Machine learning comprises a series of techniques for automatic extraction of meaningful information from large collections of noisy data. In many real world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing wi...
Journal
Journal title: Information Sciences
Year: 2021
ISSN: 0020-0255, 1872-6291
DOI: https://doi.org/10.1016/j.ins.2020.12.020